Abstract

We introduce verifiable criteria for weak posterior consistency of identifiable Bayesian 
nonparametric inference for jump diffusions with unit diffusion coefficient and uniformly 
Lipschitz drift and jump coefficients in arbitrary dimension. The criteria are expressed in 
terms of coefficients of the SDEs describing the process, and do not depend on intractable 
quantities such as transition densities. We also show that products of discrete net and Dirichlet
mixture model priors satisfy our conditions, again under an identifiability assumption.
This generalises known results by incorporating jumps into previous work on unit diffusions 
with uniformly Lipschitz drift coefficients. 


1 Introduction 


Jump diffusions are a broad class of stochastic processes encompassing systems undergoing
deterministic mean-field dynamics, microscopic diffusion and macroscopic jumps. In this paper
we let $X := (X_t)_{t \geq 0}$ denote a unit jump diffusion, which can be described as a solution to a
stochastic differential equation of the form

$$dX_t = b(X_t)\,dt + dW_t + c(X_{t-}, dZ_t) \tag{1}$$

on a domain $\Omega \subseteq \mathbb{R}^d$ given an initial condition $X_0 = x_0$, coefficients $b : \Omega \to \mathbb{R}^d$ and
$c : \Omega \times \mathbb{R}^d_0 \to \mathbb{R}^d_0$, a $d$-dimensional Brownian motion $(W_t)_{t \geq 0}$ and a pure jump Lévy process $(Z_t)_{t \geq 0}$
on $\mathbb{R}^d_0 := \mathbb{R}^d \setminus \{0\}$ with Lévy measure $M(dz)$ satisfying

$$\int_{\mathbb{R}^d_0} (\|z\|_2^2 \wedge 1)\, M(dz) < \infty.$$


The notation $\|\cdot\|_{p,\rho}$ denotes the $L^p(\rho)$-norm, where the Lebesgue measure is meant whenever
the measure $\rho$ is omitted.

Jump diffusions are used as models across a broad spectrum of applications, such as economics
and finance [Merton, 1976, Aase and Guttorp, 1987, Bardhan and Chao, 1993, Chen and Filipović, 2005,
Filipović et al., 2007], biology [Kallianpur, 1992, Kallianpur and Xiong, 1994, Bertoin and Le Gall, 2003,
Birkner et al., 2009] and engineering [Au et al., 1982, Bodo et al., 1987]. They also contain
many important families of stochastic processes as special cases, including diffusions and
Lévy processes.


Remark 1 . In the exposition above, the processes X, W and Z all share a common dimension. 
This restriction is not necessary for any of the results in the paper, and has been introduced 
purely for readability of notation. 
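
The dynamics in (1) can be illustrated numerically. The following is a minimal sketch and not part of the results of the paper: it assumes $d = 1$, a compound Poisson driver $Z$ with finite rate (so that $M$ is a finite measure), and a plain Euler discretisation allowing at most one jump per step. The function name, the drift $b(x) = -x$ and the jump coefficient $c(x,z) = z/(1+x^2)$ are illustrative choices only.

```python
import math
import random


def simulate_unit_jump_diffusion(b, c, x0, horizon, n_steps,
                                 jump_rate, sample_jump, rng):
    """Euler scheme for dX = b(X) dt + dW + c(X_-, dZ) in one dimension.

    Assumption: Z is compound Poisson with intensity `jump_rate` and jump
    sizes drawn by `sample_jump(rng)`; at most one jump per time step.
    """
    dt = horizon / n_steps
    x = x0
    path = [x0]
    for _ in range(n_steps):
        x_minus = x  # left limit used in the jump coefficient
        x = x_minus + b(x_minus) * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # Probability of at least one jump of Z during a step of length dt.
        if rng.random() < 1.0 - math.exp(-jump_rate * dt):
            x += c(x_minus, sample_jump(rng))
        path.append(x)
    return path


rng = random.Random(0)
path = simulate_unit_jump_diffusion(
    b=lambda x: -x,                     # mean-reverting drift
    c=lambda x, z: z / (1.0 + x * x),   # satisfies c(x, 0) = 0
    x0=0.0, horizon=10.0, n_steps=10_000, jump_rate=0.5,
    sample_jump=lambda r: r.gauss(0.0, 1.0), rng=rng)
print(len(path), path[-1])
```

Subsampling `path` at a fixed spacing gives discrete observations of the kind considered in the remainder of the paper.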

Under regularity conditions summarised in the next section, jump diffusions are recurrent, ergodic
Feller-Markov processes with transition densities $p_t(x,y)dy$ and a unique stationary density
$\pi(x)dx$ with respect to the $d$-dimensional Lebesgue measure. Under such conditions the
procedure of Bayesian inference can be applied to infer the coefficients of the jump diffusion
based on observations taken at discrete times. In this paper we focus on joint inference of the
drift function $b$ and the family of Lévy measures $\nu(x,dz) := M(c^*(x,dz))$, where $c^*(x,\cdot)$ denotes
the pull-back of $c(x,\cdot)$:

$$c^*(x,dz) := \{y \in \mathbb{R}^d_0 : c(x,y) \in dz\}.$$

We abuse terminology and refer to the collection of measures $\nu(x,\cdot)$ as a Lévy measure for the
remainder of the paper. Inference of the Lévy measure will refer to inference of $\nu$, assuming
that neither $c$ nor $M$ is known.


More precisely, let $\Theta$ denote a set of pairs $(b,\nu)$, and let $\Pi$ denote a prior distribution on
$(\Theta, \mathcal{B}(\Theta))$, where $\mathcal{B}(\Theta)$ is the Borel $\sigma$-algebra. Let $x_{0:n} = (x_0, x_\delta, \ldots, x_{\delta n})$ denote a time series
of observations sampled from a stationary jump diffusion $X$ at fixed separation $\delta$. The object
of interest is the posterior distribution, which can be expressed as

$$\Pi(A \mid x_{0:n}) = \frac{\int_A \pi^{b,\nu}(x_0) \prod_{i=1}^n p_\delta^{b,\nu}(x_{\delta(i-1)}, x_{\delta i})\, \Pi(db, d\nu)}{\int_\Theta \pi^{b,\nu}(x_0) \prod_{i=1}^n p_\delta^{b,\nu}(x_{\delta(i-1)}, x_{\delta i})\, \Pi(db, d\nu)}$$

for measurable sets $A \in \mathcal{B}(\Theta)$. In the Bayesian setting, the posterior encodes all the available
information for inferential purposes. The restriction to unit diffusion coefficients implicit in (1) is
a strong assumption in dimension $d > 1$, though some models which fail to satisfy it outright can
still be treated via the Lamperti transform [Aït-Sahalia, 2008]. We will outline this procedure
briefly in Section 2.
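
The structure of the posterior can be made concrete in a toy case. The sketch below is an illustration under strong simplifying assumptions and is not the setting of this paper: it takes $d = 1$, no jumps, and a finite candidate set of Ornstein-Uhlenbeck drifts $b(x) = -\theta x$, for which the stationary and transition densities are Gaussian and hence tractable. All function names and the candidate values of $\theta$ are hypothetical.

```python
import math
import random


def log_normal_pdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def ou_log_likelihood(theta, xs, delta):
    """log pi(x_0) + sum_i log p_delta(x_{i-1}, x_i) for dX = -theta X dt + dW."""
    var = (1.0 - math.exp(-2.0 * theta * delta)) / (2.0 * theta)
    ll = log_normal_pdf(xs[0], 0.0, 1.0 / (2.0 * theta))  # stationary density
    for prev, cur in zip(xs, xs[1:]):
        ll += log_normal_pdf(cur, prev * math.exp(-theta * delta), var)
    return ll


def posterior(thetas, prior, xs, delta):
    """Posterior masses over a finite parameter set, computed in log space."""
    logw = [math.log(p) + ou_log_likelihood(t, xs, delta)
            for t, p in zip(thetas, prior)]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return [v / total for v in w]


# Exact simulation of stationary OU data with theta = 1 at spacing delta.
rng = random.Random(1)
true_theta, delta = 1.0, 0.5
sd = math.sqrt((1.0 - math.exp(-2.0 * true_theta * delta)) / (2.0 * true_theta))
xs = [rng.gauss(0.0, math.sqrt(1.0 / (2.0 * true_theta)))]
for _ in range(2000):
    xs.append(rng.gauss(xs[-1] * math.exp(-true_theta * delta), sd))

thetas = [0.5, 1.0, 2.0]
post = posterior(thetas, [1.0 / 3.0] * 3, xs, delta)
print(post)
```

In the jump diffusion setting neither density is available in closed form, which is why the conditions developed below avoid them.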


A typical approach to practical Bayesian inference is to choose $\Theta$ to consist of parametric
families of drift functions and Levy measures, and fit these parameters to data. However, the 
natural parameter spaces for jump diffusions are spaces of functions and measures, which are 
infinite dimensional and cannot be represented in terms of finitely many parameters without 
significant loss of modelling freedom. Nonparametric Bayesian inference can be thought of as 
inference of infinitely many parameters, and retains much of the modelling freedom inherent in 
the class of jump diffusions. 


A natural and central question is whether the Bayes procedure is consistent, that is, whether
the posterior concentrates on a neighbourhood of the point of the parameter space which specifies the "true"
dynamics generating the data as the number of observations grows. If $(b_0,\nu_0) \in \Theta$ denotes the
data generating drift and Lévy measure, consistency can be expressed as $\Pi(U^c_{b_0,\nu_0} \mid x_{0:n}) \to 0$ as
$n \to \infty$, where $U_{b_0,\nu_0}$ is an open neighbourhood of $(b_0,\nu_0)$.


Whether or not Bayesian posterior consistency holds in the nonparametric setting is an intricate
question, and depends in subtle ways on the prior $\Pi$ and the topology endowed on
$\Theta$ [Diaconis and Freedman, 1986]. A further difficulty in the present context is the fact that
stationary and transition densities of jump diffusions are intractable in practically all cases of
interest, so that usual conditions for posterior consistency are difficult to verify. These difficulties
were recently overcome for discretely observed, one-dimensional unit diffusions under
restrictive conditions on the drift function [van der Meulen and van Zanten, 2013], and a multidimensional
generalisation was presented in [Gugushvili and Spreij, 2014]. Both results rely
on martingale arguments developed by Ghosal and Tang for Markov processes with tractable
transition probabilities [Ghosal and Tang, 2006, Tang and Ghosal, 2007]. A Bayesian analysis of
diffusions is given in [van der Meulen et al., Panzar and van Zanten, 2009, Pokern et al., 2013], and a review
for one dimensional diffusions is provided by [van Zanten, 2013]. Similar
developments have also been made for frequentist drift estimation from discrete observations,
both for one dimensional unit diffusions [Jacod, 2000, Gobet et al., 2004, Comte et al., 2007]
and their multi-dimensional generalisations [Dalalyan and Reiß, 2007, Schmisser, 2013].

The main result of this paper is consistency of Bayesian nonparametric joint inference of drift
functions and Lévy measures in arbitrary dimension under verifiable conditions on the prior,
given an identifiability assumption which seems difficult to verify in general. This generalises
the result of [Gugushvili and Spreij, 2014] by incorporating discontinuous processes with jumps.
We also show that products of discrete net and Dirichlet process mixture distributions provide
a class of priors for which our conditions hold. The key results enabling this generalisation
are a generalised Girsanov-type change of measure theorem for jump diffusions [Cheridito et al.,
2005] and a coupling method for establishing regularity of semigroups [Wang, 2010].


The rest of the paper is organised as follows. In Section 2 we introduce the jump diffusion
processes in finite dimensional domains and necessary regularity conditions. In Section 3 we
define the inference problem under study, and state and prove the corresponding consistency
result. In Section 4 we introduce the discrete net prior, and show that it satisfies our consistency
conditions. Section 5 concludes with a discussion.


2 Jump diffusions 


A general time-homogeneous, $d$-dimensional jump diffusion $Y := (Y_t)_{t \geq 0}$ is the solution of a
stochastic differential equation of the form

$$dY_t = b(Y_t)\,dt + \sigma(Y_t)\,dW_t + c(Y_{t-}, dZ_t),$$

where $\sigma : \Omega \to \mathbb{R}^{d \times d}$ and the other coefficients are as in (1). The implicit assumption in (1) of
$\sigma \equiv 1$ is restrictive in dimensions $d > 1$. Processes which do not have unit diffusion coefficient
can be dealt with provided they lie in the domain of the Lamperti transform [Aït-Sahalia, 2008],
i.e. if there exists a mapping $q : Y \mapsto X$ such that $X$ is of the form (1). Such transforms
exist for any non-degenerate process in one dimension, but only rarely in higher dimensions.
Sufficient conditions for the Lamperti transform to be well defined are non-singularity of $\sigma$ and
the following symmetry condition [Yu, 2007, Aït-Sahalia, 2008]:

$$\frac{\partial (\sigma^{-1})_{ij}(x)}{\partial x_k} = \frac{\partial (\sigma^{-1})_{ik}(x)}{\partial x_j} \quad \text{for all } i, j, k \in \{1, \ldots, d\}. \tag{2}$$


We note also that the Lamperti transform cannot be constructed from discrete data, so that in
any case $\sigma$ must be known a priori. While restrictive, this assumption cannot be relaxed without
fundamental changes to the method of proof of consistency and already arises in the simpler
case of diffusions without jumps [van der Meulen and van Zanten, 2013, Gugushvili and Spreij,
2014].
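
In one dimension the Lamperti transform of the jump-free part is explicit: $q(y) = \int^y du/\sigma(u)$, and Itô's formula gives $X = q(Y)$ unit diffusion coefficient and drift $b/\sigma - \sigma'/2$ between jumps. The sketch below checks this numerically. It is an illustration only: the coefficient $\sigma(y) = s\,y$ on $y > 0$, the helper names and the quadrature rule are assumptions made for the example, not constructions from this paper.

```python
import math


def lamperti(sigma, y, y_ref=1.0, n=10_000):
    """q(y) = integral from y_ref to y of du / sigma(u), by the midpoint rule."""
    h = (y - y_ref) / n
    return sum(h / sigma(y_ref + (k + 0.5) * h) for k in range(n))


def transformed_drift(b, sigma, dsigma, y):
    """Drift of X = q(Y) between jumps: b / sigma - sigma' / 2, evaluated at y."""
    return b(y) / sigma(y) - 0.5 * dsigma(y)


s = 0.4
sigma = lambda y: s * y            # illustrative multiplicative noise
q_num = lamperti(sigma, 3.0)
q_exact = math.log(3.0) / s        # closed form of q for this sigma
drift = transformed_drift(lambda y: 0.1 * y, sigma, lambda y: s, 3.0)
print(q_num, q_exact, drift)
```

For this choice the transformed drift is the constant $0.1/s - s/2$, so the transformed process has unit diffusion coefficient and constant drift between jumps.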


The following proposition summarises the necessary regularity assumptions for existence and
uniqueness of Feller-Markov jump diffusions with transition densities and a unique stationary
density:

Proposition 1. Assume that $c(\cdot,0) = 0$, and that there exist constants $C_1, C_2, C_3, C_4 > 0$ such
that

$$\|b(x) - b(y)\|_2^2 + \int_{\mathbb{R}^d_0} \|c(x,z) - c(y,z)\|_2^2\, M(dz) \leq C_1 \|x - y\|_2^2, \tag{3}$$

$$\|c(x,z) - c(x,\xi)\|_2^2 \leq C_2 \|z - \xi\|_2^2, \tag{4}$$

$$\text{for every } x \in \Omega \text{ with } \|x\|_2 > C_3 \text{ the following holds: } x \cdot b(x) \leq -C_4 \|x\|_2^2, \tag{5}$$

$$\int_{\{z \in \mathbb{R}^d_0 : \|z\|_2 > 1\}} \|z\|_2^2\, M(dz) < \infty. \tag{6}$$

Then (1) has a unique, ergodic weak solution $X$ with the Feller and Markov properties. Furthermore,
$X$ has a unique stationary density $\pi^{b,\nu}(x)dx$ with a finite second moment, and the
associated semigroup $P_t^{b,\nu}$ has transition densities $p_t^{b,\nu}(x,y)dy$.

Proof. Existence and uniqueness of $X$ are obtained from (3), as well as the linear growth bounds
implied by Lipschitz continuity, by Theorem 6.2.9 of [Applebaum, 2004]. Theorem 6.4.6 of
[Applebaum, 2004] gives the Markov property under the same conditions. Finally, the corollary
in Appendix 1 of [Kolokoltsov, 2004] yields the Feller property. In turn, the Feller property and
the fact that $\log(1 + \|\xi\|_2)^{-1} \|\xi\|_2^2 \to \infty$ as $\|\xi\|_2 \to \infty$ mean that the hypotheses of Theorem
1.1 of [Schilling and Wang] are fulfilled, so that $X$ has bounded transition densities with
respect to the Lebesgue measure.

Existence and uniqueness of $\pi^{b,\nu}$, as well as ergodicity of $X$, will follow from Theorem 2.1 of
[Masuda, 2007], the hypotheses of which will now be verified. Along with $c(\cdot,0) = 0$, conditions
(3) and (4) above imply Assumption 1 of [Masuda, 2007]. Now, for every $u \in (0,1)$ let

$$b_u(x) := b(x) - \int_{u < \|z\|_2 \leq 1} c(x,z)\, M(dz).$$

Assumption 2(a)' of [Masuda, 2009] requires $X$ to admit bounded transition densities, and the
diffusion which solves

$$dX^u_t = b_u(X^u_t)\,dt + \sigma(X^u_t)\,dW_t$$

to be irreducible for each $u > 0$. Boundedness of the transition density of $X$ was established
above, and irreducibility of $X^u$ holds because $\sigma \equiv 1$ by Theorem 2.3 of [Stramer and Tweedie,
1997].


Next we verify Assumptions 3 and 3* of Masudal . 120071 ] by checking the conditions of Lemma 2.4’ 
of Masuda . 20091] . The diffusion coefficient is constant, and hence o(||ai ||2 9 ^ 2 ) for any q e (0, 2). 
Condition ([S]) is the corresponding hypothesis of Masuda . 20091] . and both 1 1X11 o 2 x-fo(x 


—00 


and ||x || 2 2 x• 6 (x) < — C 4 follow from ([5]) . Hence, Assumptions 3 and 3* of Masuda. 12007 ] hold 


This yields ergodicity (and mixing) by Theorem 2.1 of [Masuda, 2007], and second moments of
the stationary distribution (and exponential mixing) by Theorem 2.2 of [Masuda, 2007].


It remains to show the invariant measure has a density. By combining Proposition 5.1.9 and
Theorem 5.1.8 of [Fornaro, 2004] it can be seen that invariant measures of irreducible strong
Feller processes are equivalent to the associated transition probabilities, which is sufficient in
this case. Assumption 1 of [Masuda, 2007] and Assumption 2(a)' of [Masuda, 2009] imply
irreducibility of $X$ (c.f. Claim 1 on page 42 of [Masuda, 2007]). Condition (3) guarantees the
strong Feller property by Theorem 2.3 of [Wang, 2010]. Hence the invariant measure has a
density with respect to the transition densities, and thus also the Lebesgue measure. This
concludes the proof. □

Remark 2. Assumption (3) is central to the proof of our main result. In contrast, assumptions
(4), (5) and (6) are only needed to ensure the conclusions of Proposition 1 as well as (10) and
(11). Whenever the prior $\Pi$ is supported by a set $\Theta$ for which versions of Proposition 1, (10) and
(11) can be established without (4), (5) and (6), these assumptions can be discarded without
affecting our results.

We denote the law of $X$ with drift function $b$, Lévy measure $\nu$ and initial condition $X_0 = x$
by $\mathbb{P}_x^{b,\nu}$ and the corresponding expectation by $\mathbb{E}_x^{b,\nu}$. Dependence on initial conditions is omitted
when the stationary process is meant.


3 Consistency for discrete observations 


We begin by defining the topology and weak posterior consistency following the set up of
[van der Meulen and van Zanten, 2013]. In addition to topological details, posterior consistency
is highly sensitive to the support of the prior, which should not exclude the truth. This is guaranteed
by insisting that the prior places positive mass on all neighbourhoods of the truth, typically
measured in terms of Kullback-Leibler divergence. In our setting such a support condition is
provided by (8) below.


We begin by setting out the necessary assumptions on the parameter space 0. 


Definition 1. Let $\Theta = \{(b,\nu) : b : \Omega \to \mathbb{R}^d,\ \nu : \Omega \times \mathbb{R}^d_0 \to \mathbb{R}_+\}$ denote a set of pairs of
drift functions $b(x)$ and Lévy measures $\nu(x,dz) := M(c^*(x,dz))$ with each pair satisfying the
hypotheses of Proposition 1. Furthermore, suppose that there exists a constant $C_5 > 0$ such
that

$$M(\{z \in \mathbb{R}^d_0 : \|z\|_2 > 1\}) \leq C_5 \tag{7}$$

uniformly in $\Theta$, and that for each $x \in \Omega$ and any pair of Lévy measures $(\cdot,\nu), (\cdot,\nu') \in \Theta$ the
measures $\nu(x,\cdot) \sim \nu'(x,\cdot)$ are equivalent with strictly positive, finite Radon-Nikodym density
$0 < \frac{d\nu'}{d\nu} < \infty$, and that either

1. $\nu(x,\cdot)$ is a finite measure or

2. there exists an open set $A$ containing the origin such that $\nu(x,\cdot)|_A = \nu'(x,\cdot)|_A$.


Remark 3. In effect, the conditions of Definition 1 mean that the unit diffusion coefficient and
the infinite intensity component of the Lévy measure can be thought of as known confounders
of the joint inference problem for the drift function and the compound Poisson component of
the Lévy measure driving macroscopic jumps. In particular, one of conditions 1. or 2. is needed
to ensure finiteness of the integrals (10) and (11).


The following assumption ensures that the drift function and Lévy measure can be uniquely
identified from discrete data. It is crucial for our main result, since posterior consistency
as we will define it below does not even make sense for a non-identifiable inference
problem. However, we have not been able to produce a tractable identifiability condition beyond
that given in [Gugushvili and Spreij, 2014] for gradient-type diffusions. See also page 398 of
[Bladt and Sørensen, 2005] for a discussion on the challenge posed by identifiability in the simpler
setting of discretely observed Markov processes with finite state spaces.


Assumption 1. For $\Pi$-almost any pair $(b,\nu) \neq (b',\nu') \in \Theta$ there exists $x \in \Omega$ and $f \in \mathcal{D}(\mathcal{G}^{b,\nu})$
such that $P_\delta^{b,\nu} f(x) \neq P_\delta^{b',\nu'} f(x)$. In particular, identifying $P_\delta^{b,\nu}$ is equivalent to identifying $(b,\nu)$.
We emphasize that both $x$ and $f$ may depend on $(b,\nu)$ and $(b',\nu')$.


The topology under consideration is defined as in [van der Meulen and van Zanten, 2013, Gugushvili and Spreij,
2014] by specifying a subbase determined by the semigroups $P_\delta^{b,\nu}$. For details about the notion
of a subbase, and other topological concepts, see e.g. [Dudley, 2002].

Definition 2. Fix a sampling interval $\delta > 0$ and a finite measure $\rho \in \mathcal{M}_f(\Omega)$ with positive
mass in all non-empty, open sets. For any $(b,\nu) \in \Theta$, $\varepsilon > 0$ and $f \in C_b(\Omega)$ define the set

$$U^{b,\nu}_{f,\varepsilon} := \{(b',\nu') \in \Theta : \|P_\delta^{b',\nu'} f - P_\delta^{b,\nu} f\|_{1,\rho} < \varepsilon\}.$$

A weak topology on $\Theta$ is generated by requiring that the family $\{U^{b,\nu}_{f,\varepsilon} : f \in C_b(\Omega), \varepsilon > 0, (b,\nu) \in \Theta\}$
is a subbase of the topology.


The following lemma is a direct analogue of Lemma 3.2 of [van der Meulen and van Zanten,
2013]:


Lemma 1. The topology generated by a subbase of sets of the form $U^{b,\nu}_{f,\varepsilon}$ is Hausdorff.


Proof. Consider $(b,\nu) \neq (b',\nu') \in \Theta$. By Assumption 1 there exists $f \in C_b(\Omega)$ and $x \in \Omega$ such
that $P_\delta^{b,\nu} f(x) \neq P_\delta^{b',\nu'} f(x)$, and hence by continuity a nonempty open set $J \subset \Omega$ where $P_\delta^{b,\nu} f$
and $P_\delta^{b',\nu'} f$ differ. Hence $\|P_\delta^{b,\nu} f - P_\delta^{b',\nu'} f\|_{1,\rho} > \varepsilon$ for some $\varepsilon > 0$ so that the neighbourhoods
$U^{b,\nu}_{f,\varepsilon/2}$ and $U^{b',\nu'}_{f,\varepsilon/2}$ are disjoint. □


We are now in a position to formally define posterior consistency, and state the main result of 
the paper. 


Definition 3. Let $x_{0:n} := (x_0, \ldots, x_n)$ denote $n + 1$ samples observed at sampling times
$0, \delta, \ldots, \delta n$ from $X$ at stationarity, i.e. with initial distribution $X_0 \sim \pi^{b_0,\nu_0}$. Weak posterior
consistency holds if $\Pi(U^c_{b_0,\nu_0} \mid x_{0:n}) \to 0$ with $\mathbb{P}^{b_0,\nu_0}$-probability 1 as $n \to \infty$, where $U_{b_0,\nu_0}$ is any
open neighbourhood of $(b_0,\nu_0) \in \Theta$.


Theorem 1. Let $x_{0:n}$ be as in Definition 3 and suppose that the prior $\Pi$ is supported on a set
$\Theta$ which satisfies Assumption 1 and the conditions of Definition 1 with the constants $C_1$ and
$C_5$ in (3) and (7) holding uniformly in $\Theta$. If


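$$\Pi\left( (b,\nu) \in \Theta : \frac{1}{2}\left( \|b_0 - b\|_{2,\pi^{b_0,\nu_0}} + \left\| \int_{\mathbb{R}^d_0} \left[\frac{d\nu_0}{d\nu}(\cdot,z) - 1\right] \mathbb{1}_{(0,1]}(\|z\|_2)\, z\, \nu(\cdot,dz) \right\|_{2,\pi^{b_0,\nu_0}} \right)^2 + \left\| \int_{\mathbb{R}^d_0} \left[ \log\left(\frac{d\nu_0}{d\nu}(\cdot,z)\right) - \frac{d\nu_0}{d\nu}(\cdot,z) + 1 \right] \nu_0(\cdot,dz) \right\|_{1,\pi^{b_0,\nu_0}} < \varepsilon \right) > 0 \tag{8}$$

for any $\varepsilon > 0$ and any $(b_0,\nu_0) \in \Theta$, then weak posterior consistency holds for $\Pi$ on $\Theta$.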


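Proof. We prove Theorem 1 by generalising the proof of Theorem 3.5 of [van der Meulen and van Zanten,
2013]. For $(b,\nu) \in \Theta$ let $\mathrm{KL}(b_0,\nu_0;b,\nu)$ denote the Kullback-Leibler divergence between $p_\delta^{b_0,\nu_0}$
and $p_\delta^{b,\nu}$:

$$\mathrm{KL}(b_0,\nu_0;b,\nu) := \int_\Omega \int_\Omega \log\left(\frac{p_\delta^{b_0,\nu_0}(x,y)}{p_\delta^{b,\nu}(x,y)}\right) p_\delta^{b_0,\nu_0}(x,y)\, \pi^{b_0,\nu_0}(x)\, dy\, dx,$$

and for two probability measures $P, P'$ on the same $\sigma$-field let $K(P,P') := E_P\left[\log\left(\frac{dP}{dP'}\right)\right]$. The
law of a random object $Z$ under a probability measure $P$ is denoted by $\mathcal{L}(Z \mid P)$.

We require the following two properties:

1. $\Pi((b,\nu) \in \Theta : \mathrm{KL}(b_0,\nu_0;b,\nu) < \varepsilon) > 0$ for any $\varepsilon > 0$.

2. Uniform equicontinuity of the functions $\{P_\delta^{b,\nu} f : (b,\nu) \in \Theta\}$ for $f \in C_b(\Omega)$, the set of
bounded, continuous functions on $\Omega$.

These two properties will be established in Lemmas 2 and 3 below, which are the necessary
generalisations of Lemmas 5.1 and A.1 of [van der Meulen and van Zanten, 2013], respectively.

Lemma 2. Condition (8) implies that $\Pi((b,\nu) \in \Theta : \mathrm{KL}(b_0,\nu_0;b,\nu) < \varepsilon) > 0$ for any $\varepsilon > 0$.

Proof. As in Lemma 5.1 of [van der Meulen and van Zanten, 2013] it will be sufficient to bound
$\mathrm{KL}(b_0,\nu_0;b,\nu)$ from above by a constant multiple of

$$\frac{1}{2}\left( \|b_0 - b\|_{2,\pi^{b_0,\nu_0}} + \left\| \int_{\mathbb{R}^d_0} \left[\frac{d\nu_0}{d\nu}(\cdot,z) - 1\right] \mathbb{1}_{(0,1]}(\|z\|_2)\, z\, \nu(\cdot,dz) \right\|_{2,\pi^{b_0,\nu_0}} \right)^2 + \left\| \int_{\mathbb{R}^d_0} \left[ \log\left(\frac{d\nu_0}{d\nu}(\cdot,z)\right) - \frac{d\nu_0}{d\nu}(\cdot,z) + 1 \right] \nu_0(\cdot,dz) \right\|_{1,\pi^{b_0,\nu_0}}.$$

A formal calculation yields

$$\begin{aligned}
K(\pi^{b_0,\nu_0}, \pi^{b,\nu}) + \mathrm{KL}(b_0,\nu_0;b,\nu)
&= \int_\Omega \int_\Omega \log\left( \frac{\pi^{b_0,\nu_0}(x)\, p_\delta^{b_0,\nu_0}(x,y)}{\pi^{b,\nu}(x)\, p_\delta^{b,\nu}(x,y)} \right) p_\delta^{b_0,\nu_0}(x,y)\, \pi^{b_0,\nu_0}(x)\, dy\, dx \\
&= K\left(\mathcal{L}(X_0, X_\delta \mid \mathbb{P}^{b_0,\nu_0}),\ \mathcal{L}(X_0, X_\delta \mid \mathbb{P}^{b,\nu})\right) \\
&\leq K\left(\mathcal{L}((X_t)_{t \in [0,\delta]} \mid \mathbb{P}^{b_0,\nu_0}),\ \mathcal{L}((X_t)_{t \in [0,\delta]} \mid \mathbb{P}^{b,\nu})\right) \\
&= K(\pi^{b_0,\nu_0}, \pi^{b,\nu}) + \mathbb{E}^{b_0,\nu_0}\left[ \log\left( \frac{d\mathbb{P}^{b_0,\nu_0}_{X_0}}{d\mathbb{P}^{b,\nu}_{X_0}} \left((X_t)_{t \in [0,\delta]}\right) \right) \right]
\end{aligned} \tag{9}$$

by the conditional version of Jensen's inequality.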


The aim is to identify the Radon-Nikodym derivative using Theorem 2.4 of [Cheridito et al.,
2005], the hypotheses of which will now be verified. The local boundedness assumptions of
[Cheridito et al., 2005] follow from Lipschitz continuity (3). Moreover, let $\{\Omega_n\}_{n=1}^\infty$ denote a
sequence of bounded, open subsets of $\Omega$ such that $\Omega_1 \subset \Omega_2 \subset \ldots$ and $\bigcup_{n} \Omega_n = \Omega$. Then
Lipschitz continuity, and the assumed finiteness of the Radon-Nikodym derivatives in Definition
1 ensure that there exists a sequence of finite constants $\{K_n\}_{n=1}^\infty$ such that

$$\sup_{x \in \Omega_n} \left\| b_0(x) - b(x) - \int_{\mathbb{R}^d_0} \left[\frac{d\nu_0}{d\nu}(x,z) - 1\right] \mathbb{1}_{(0,1]}(\|z\|_2)\, z\, \nu(x,dz) \right\|_2 < K_n \tag{10}$$

$$\sup_{x \in \Omega_n} \int_{\mathbb{R}^d_0} \left[ \frac{d\nu_0}{d\nu}(x,z) \log\left(\frac{d\nu_0}{d\nu}(x,z)\right) - \frac{d\nu_0}{d\nu}(x,z) + 1 \right] \nu(x,dz) < K_n \tag{11}$$

for each $(b,\nu) \in \Theta$ and each $n \in \mathbb{N}$. In particular, the conditions in Remark 2.5 of [Cheridito et al.,
2005] are satisfied. Hence Theorem 2.4 of [Cheridito et al., 2005] holds, and the Radon-Nikodym
term on the RHS of (9) can be expressed as $\mathbb{E}^{b_0,\nu_0}[\log(\mathcal{E}(L)_\delta)]$, where $\mathcal{E}$ is the Doléans-Dade
stochastic exponential and the process $L := (L_t)_{t \in [0,\delta]}$ is given as

$$L_t = \int_0^t \int_{\mathbb{R}^d_0} \left[\frac{d\nu_0}{d\nu}(X_{s-},z) - 1\right] \left(Z^\nu(X_{s-},dz,ds) - \nu(X_{s-},dz)\,ds\right) + \int_0^t \left[ b_0(X_s) - b(X_s) - \int_{\mathbb{R}^d_0} \left(\frac{d\nu_0}{d\nu}(X_{s-},z) - 1\right) \mathbb{1}_{(0,1]}(\|z\|_2)\, z\, \nu(X_{s-},dz) \right] dX^c_s,$$

where $(X^c_s)_{s \geq 0}$ is the continuous martingale part of $X$, i.e. a Brownian motion in this setting,
and $Z^\nu(x,\cdot)$ is a Poisson random measure with intensity $\nu(x,dz) \otimes ds$. Note that under $\mathbb{P}^{b_0,\nu_0}$
the process $L$ is a local martingale, $L^c$ is a continuous local martingale with quadratic variation

$$\langle L^c \rangle_t = \int_0^t \left\| b_0(X_s) - b(X_s) - \int_{\mathbb{R}^d_0} \left(\frac{d\nu_0}{d\nu}(X_{s-},z) - 1\right) \mathbb{1}_{(0,1]}(\|z\|_2)\, z\, \nu(X_{s-},dz) \right\|_2^2 ds$$

and jump discontinuities of $L$ can be written as

$$\Delta L_t = \left[\frac{d\nu_0}{d\nu}(X_{t-},\Delta X_t) - 1\right] \mathbb{1}_{(0,\infty)}(\|\Delta X_t\|_2),$$

where $\Delta X_t$ denotes a jump discontinuity of $X$ at time $t$. Now, the expected quadratic variation
of $(L^c)_t$ can be bounded by

$$\mathbb{E}^{b_0,\nu_0}[\langle L^c \rangle_t] \leq \int_0^t \mathbb{E}^{b_0,\nu_0}\left[ \|b_0(0) + b(0) + 2C_1 X_s + K\|_2^2 \right] ds$$

for some constant $K > 0$, using (3), the uniform upper and lower bounds on $\frac{d\nu_0}{d\nu}$ and the fact
that $\nu$ and $\nu_0$ are equivalent and either both finite, or $\frac{d\nu_0}{d\nu} = 1$ on a neighbourhood of
0 and $\nu$ is finite on any open set not containing the origin. The stationary density has a finite second
moment by Proposition 1, so that $\mathbb{E}^{b_0,\nu_0}[\langle L^c \rangle_t] \leq K't$ for some other constant $K' > 0$. Likewise,




E AL 

t:||AX t |l2^0 


= / e 6o,i/ ° 




ds 


is finite due to the aforementioned conditions on vq and v. Thus L has expected quadratic 
variation 


E bo ’ vo [{L) t \ = E bo,1 '° 


AL‘fv(X s ,dz)ds + (L c 

i: || AXt || 2 


< oo 


for any $t > 0$, and is a true $\mathbb{P}^{b_0,\nu_0}$-martingale by Corollary 3 on page 73 of [Protter, 2005]. Then,
the Radon-Nikodym term in (9) can be written as


_ 60,^0 


'dP x 0,I/ ° 


log Ur ((x * Wl) 


= E fe °’^[log (S(Lt))\ 


Ls — Lq — - (L c ) s + {log(l + A L t ) — A L t } 

(:AX ^0 


V dv 


(X t _,z) - 1 l (0il ](||z|| 2 )zi/(X t _,dz) 


dt 


0<t<(5:AXi^0 


+ E hog(A (Xl . ! AX 1 ))-(=(X 1 _,AX i )-l 


dv 


dv 0 


dv 


< 5 


' n| 6 o - 6 II 2 ,*) 1 (o,i](ll z lb) zi/ ('.<Jz) 

L { iog (^ ( '- z) ) ■ + 4 


2,7T°0> z/ 0 


l,7T b 0^0 


( 12 ) 


where the first equality follows from Theorem 2.4 of [Cheridito et al., 2005], the second by
definition of $\mathcal{E}$ for jump diffusion processes, and the remainder of the calculation by stationarity
and because $\nu_0$ is the compensator of the Poisson random measure driving the jumps of $X$ under
$\mathbb{P}^{b_0,\nu_0}$. The result now follows from (9) and (12). □

Lemma 3. For each $\delta > 0$ and $f \in C_b(\Omega)$, the collection $\{P_\delta^{b,\nu} f : (b,\nu) \in \Theta\}$ is locally uniformly
equicontinuous: for any compact $K \subset \Omega$ and $\varepsilon > 0$ there exists $\gamma := \gamma(\varepsilon,K,f,\delta) > 0$ such that

$$\sup_{(b,\nu) \in \Theta}\ \sup_{\substack{x,y \in K: \\ \|x-y\|_2 < \gamma}} |P_\delta^{b,\nu} f(x) - P_\delta^{b,\nu} f(y)| < \varepsilon.$$

Proof. Theorem 2.3 of [Wang, 2010] establishes Lipschitz continuity for jump diffusions satisfying
(3) using a coupling argument for $f \in B_b(\Omega)$, the set of bounded, measurable functions. We
begin by showing that the conditions of [Wang, 2010] are satisfied.

In our notation and setting, the condition of Theorem 2.3 of [Wang, 2010] is that for some
constant $\beta \in (0,1)$ there exists a constant $C_\beta > 0$ such that


(1 + l|x - y|| 2 ) (ft(x) - b{ y), x - y) + i 1 + ll x ylk) Jj| z || 2 <i 


|c(x,z) - c(y,z)|||M(dz) 


||x — y ||2 2 ||x — y || 2 

+ (1 + ||x - y|| 2 ) f ||c(x,z) - c(y,z)|| 2 M(dz) + (1 T ||x — yl^C'g < 2 


whenever $\|x - y\|_2 \leq \beta$ and where $\langle \cdot, \cdot \rangle$ denotes the usual Euclidean inner product. By (3), the
first two summands on the LHS can be bounded by $\beta(1+\beta)\sqrt{C_1}$ and $\beta(1+\beta)C_1/2$, respectively.
The fourth is trivially bounded by $(1+\beta)C_\beta$. By Jensen's inequality, (3) and (7), the third term
can be bounded by $\beta(1+\beta)\sqrt{C_1 C_5}$. Hence, the whole LHS can be bounded by


(1 + P) 


Ci(l + ^)+7 + 


C, 


which can clearly be made arbitrarily small by choosing both $\beta$ and $C_\beta$ to be sufficiently small.
This choice can be made uniformly due to the uniform bounds on the Lipschitz constant $C_1$ and
the total mass constraint $C_5$.


Now, the Lipschitz constant in [Wang, 2010, Theorem 2.3] is of the form

Since $\beta$ and $C_\beta$ can be chosen uniformly in $\Theta$, and A and $\|f\|$ are constants, uniform equicontinuity
holds. □

The remainder of the proof follows as in [van der Meulen and van Zanten, 2013]. It suffices to
show that for $f \in C_b(\Omega)$ and $B := \{(b,\nu) \in \Theta : \|P_\delta^{b,\nu} f - P_\delta^{b_0,\nu_0} f\|_{1,\rho} > \varepsilon\}$ we have $\Pi(B \mid x_{0:n}) \to 0$
with $\mathbb{P}^{b_0,\nu_0}$-probability 1. To that end we fix $f \in \mathrm{Lip}(\Omega)$ and $\varepsilon > 0$ and thus the set $B$.
Lemma 2 implies that Lemma 5.2 of [van der Meulen and van Zanten, 2013] holds, so that if,
for measurable subsets $C_n \subset \Theta$, there exists $c > 0$ such that


/ n 

7r 6,1/ (x 0 ) Y\_p b s ' v {xi-i,-x.i)U(db,du) -> 0 

-■n i=l 


$\mathbb{P}^{b_0,\nu_0}$-a.s., then $\Pi(C_n \mid x_{0:n}) \to 0$ $\mathbb{P}^{b_0,\nu_0}$-a.s. as well. Likewise, Lemma 3 implies Lemma 5.3 of
[van der Meulen and van Zanten, 2013]: there exists a compact subset $K \subset \Omega$, $N \in \mathbb{N}$ and
compact, connected sets $I_1, \ldots, I_N$ that cover $K$ such that

$$B \subset \left( \bigcup_{j=1}^N B_j^+ \right) \cup \left( \bigcup_{j=1}^N B_j^- \right),$$

where

$$B_j^+ := \left\{ (b,\nu) \in \Theta : P_\delta^{b,\nu} f(x) - P_\delta^{b_0,\nu_0} f(x) > \frac{\varepsilon}{4\rho(K)} \text{ for every } x \in I_j \right\},$$

$$B_j^- := \left\{ (b,\nu) \in \Theta : P_\delta^{b,\nu} f(x) - P_\delta^{b_0,\nu_0} f(x) < \frac{-\varepsilon}{4\rho(K)} \text{ for every } x \in I_j \right\}.$$

Thus it is only necessary to show $\Pi(B_j^\pm \mid x_{0:n}) \to 0$ $\mathbb{P}^{b_0,\nu_0}$-almost surely. Define the stochastic


process 


D n := 


IT 


IB J 


n \ 1/2 


i=i 




Now $D_n \to 0$ exponentially fast as $n \to \infty$ by an argument identical to that used to prove
Theorem 3.5 of [van der Meulen and van Zanten, 2013]. The same is also true of the analogous
stochastic process defined by integrating over $B_j^-$, which completes the proof. □


4 An example prior 


The conditions of Theorem 1 are verifiable in the sense that they do not depend on intractable
quantities (with the exception of Assumption 1), but it is not immediately clear whether a
prior $\Pi$ satisfying its assumptions exists, in particular in the infinite dimensional setting. The
following product prior satisfies these assumptions: independent discrete net priors of
[Ghosal et al., 1997] for $b(\cdot)$ and $c(\cdot,\cdot)$, and a
further, independent Dirichlet process mixture model prior [Lo, 1984] for $M(\cdot)$. Discrete net
priors were also used in both [van der Meulen and van Zanten, 2013] and [Gugushvili and Spreij,
2014] to demonstrate the existence of priors for nonparametric inference of drifts for diffusions.


Firstly, let $\Theta_b$ be a collection of uniformly Lipschitz functions from $\Omega$ to $\mathbb{R}^d$, each satisfying (5) for
some (not necessarily uniform) constants $C_3$ and $C_4$. Let $\Theta_b^{(m)} := \{b|_{B_0(m)} : b \in \Theta_b\}$ be the set of
restrictions of elements of $\Theta_b$ to the closed ball of radius $m$ centred at the origin. By uniform equicontinuity
and the Arzelà-Ascoli theorem, $\Theta_b^{(m)}$ is totally bounded in the uniform norm. Hence, for every
$n$, it is possible to construct a finite $\varepsilon_n$-net $\Theta_b^{(m,n)}$ over $\Theta_b^{(m)}$, where $\{\varepsilon_n\}_{n \in \mathbb{N}}$ is a sequence of
strictly positive numbers tending to 0. In other words, $\Theta_b^{(m,n)}$ is a finite set with the property
that every element of $\Theta_b^{(m)}$ is within distance $\varepsilon_n$ of some element of $\Theta_b^{(m,n)}$ in the supremum
norm. Finally, every $b \in \Theta_b^{(m,n)}$ is extended to $\Omega$ by setting $b(x) = b(P_{B_0(m)} x) - x + P_{B_0(m)} x$
outside $B_0(m)$, where $P_{B_0(m)}$ is the orthogonal projection onto $B_0(m)$. Now, a discrete net prior
is constructed by fixing two probability mass functions on $\mathbb{N}$, $\{p_m\}_{m \in \mathbb{N}}$ and $\{q_n\}_{n \in \mathbb{N}}$, both of
which assign positive mass to every positive integer. Then, a draw from the prior is generated
by sampling $m \sim p_m$ and $n \sim q_n$, followed by $b \mid m, n \sim U(\Theta_b^{(m,n)})$. Samples from this prior are
bounded, uniformly Lipschitz continuous, and satisfy (5) by construction.
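
The sampling scheme just described can be sketched in code for $d = 1$. This is an illustration under assumptions that are not part of the paper: the finite net is replaced by a lattice of piecewise-linear functions whose node values are multiples of $\varepsilon_n = 1/n$ and whose slopes lie in $\{-L, 0, L\}$, and both $p_m$ and $q_n$ are taken to be geometric. Whether such a lattice is a genuine $\varepsilon_n$-net for a given class $\Theta_b$ is not checked here, and all names are hypothetical.

```python
import math
import random


def sample_positive_integer(rng, p=0.5):
    """Geometric pmf on {1, 2, ...}: every positive integer has positive mass."""
    k = 1
    while rng.random() > p:
        k += 1
    return k


def sample_net_drift(rng, lipschitz=1.0, bound=1.0):
    """Draw (m, n, b), with b uniform on an illustrative finite lattice of drifts."""
    m = sample_positive_integer(rng)   # radius of the ball B_0(m)
    n = sample_positive_integer(rng)   # resolution index, eps_n = 1 / n
    eps = 1.0 / n
    h = eps / lipschitz                # node spacing, so slopes lie in {-L, 0, L}
    k_max = math.ceil(m / h)           # nodes sit at -k_max*h, ..., k_max*h
    levels = int(bound / eps)
    values = {0: eps * rng.randint(-levels, levels)}
    for k in range(1, k_max + 1):      # independent steps: uniform over the lattice
        values[k] = values[k - 1] + eps * rng.choice((-1, 0, 1))
        values[-k] = values[-k + 1] + eps * rng.choice((-1, 0, 1))

    def b(x):
        px = max(-m, min(m, x))        # projection onto the closed ball
        pos = px / h
        lo = min(math.floor(pos), k_max - 1)
        t = pos - lo
        inside = (1.0 - t) * values[lo] + t * values[lo + 1]
        return inside - x + px         # equals `inside` whenever |x| <= m

    return m, n, b


rng = random.Random(0)
m, n, b = sample_net_drift(rng)
print(m, n, b(0.0), b(m + 5.0))
```

The extension outside the ball has slope $-1$, so $x\,b(x)$ behaves like $-x^2$ for large $|x|$, which is the mechanism behind (5) holding by construction.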


Now let $J \subset \mathbb{R}^d$ be a fixed, compact domain including the origin, and let $\Theta_c$ be a set of uniformly
Lipschitz continuous functions $c : \Omega \times J \to J$ which satisfy the following:


1. $c(\cdot, 0) = 0$,

2. $c(x, \cdot) : J \to J$ is a surjection for each $x \in \Omega$,

3. for any $c \in \Theta_c$, any $x \in \Omega$ and $z \in \mathbb{R}^d_0$, there exists an open ball of strictly positive radius
centred at $z$, $B_z(\varepsilon)$, such that $c(x,z) \neq c(x,\xi)$ for any $z \neq \xi \in B_z(\varepsilon)$,

4. for each $c \in \Theta_c$ there exists $K_c > 0$ such that

$$\sup_{x \in \Omega} \left\{ \sup_{z \in J} \{c(x,z)\} \right\} = \sup_{x : \|x\|_2 \leq K_c} \left\{ \sup_{z \in J} \{c(x,z)\} \right\} \tag{13}$$

and likewise for infima.


Condition 2. guarantees that $\nu(x,dz)$ has a positive density everywhere whenever $M(dz)$ does,
while condition 3. rules out atoms in $\nu(x,dz)$. Let $\Theta_c^{(m)} := \{c|_{B_0(m)} : c \in \Theta_c\}$ be the set of
restrictions of the first coordinate to the ball $B_0(m) \subset \Omega$, and let $\Theta_c^{(m,n)}$ be an $\varepsilon_n$-net over $\Theta_c^{(m)}$
for a strictly positive sequence $\varepsilon_n \searrow 0$. Each element of $\Theta_c^{(m,n)}$ can again be extended to a
function on the whole of $\Omega \times J$ by setting $c(x,z) = c(P_{B_0(m)} x, z)$ outside $B_0(m)$, where $P_{B_0(m)}$
denotes the orthogonal projection to $B_0(m)$ as before. An independent discrete net can be used
to define a prior for $c(\cdot,\cdot)$, by specifying two probability mass functions $\{p_m\}_{m \in \mathbb{N}}$ and $\{q_n\}_{n \in \mathbb{N}}$,
both assigning positive mass to all positive integers, and sampling draws analogously to the
discrete net prior on $\Theta_b$.


Finally, take the prior for the intensity measure $M(\cdot)$ to be a Dirichlet process mixture model
[Lo, 1984]. Let $\phi_\tau(z)$ denote the $d$-dimensional centred Gaussian density with covariance matrix
$\tau^{-1} I_{d \times d}$ restricted to $J$, and renormalised to be a probability density. Let $F$ be a probability
measure on $(0,\infty)$ assigning positive mass to all non-empty open sets, and let $\mathrm{DP}(\xi)$ denote the
law of a Dirichlet process [Ferguson, 1973] with the mean measure $\xi$ on $\mathbb{R}^d$, which is taken
to be a probability measure with a finite first moment, independent of $F$. Let $\mathcal{D}_T(J)$ denote the
space of continuous, positive densities on $J$ with total mass at most $T > 0$. The Dirichlet process
mixture model on $\mathcal{D}_T(J)$ with truncated Gaussian mixture kernel $\phi_\tau$ and mixing distribution
$U(0,T) \otimes F \otimes \mathrm{DP}(\xi)$ is specified via the following sampling procedure:


1. Sample $P \sim \mathrm{DP}(\xi)$. Then $P$ is a discrete probability measure on $\mathbb{R}^d$ with countably many
atoms with $\mathrm{DP}(\xi)$-probability 1 [Ferguson, 1973]. Let $z_1, z_2, \ldots$ denote these atoms in
some fixed ordering.

2. Sample IID copies $\tau_1, \tau_2, \ldots \sim F$.

3. Sample $\alpha \sim U(0,T)$.

4. Set $M(dz) = \alpha \sum_{j=1}^\infty P(z_j)\, \phi_{\tau_j}(z - z_j)\, dz$.

Note that samples are finite measures with strictly positive densities on $J$, which also means
they have second moments because $J$ is compact.
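
The four sampling steps can be sketched as follows for $d = 1$ and $J = [-1, 1]$. This is an approximation under stated assumptions rather than the exact procedure: the Dirichlet process is drawn by stick-breaking truncated at finitely many atoms (a swapped-in technique), the mean measure $\xi$ is taken uniform on $J$ with unit concentration, $F$ is taken to be a Gamma distribution, and each kernel is a Gaussian centred at its atom and renormalised on $J$. All names and default values are hypothetical.

```python
import math
import random


def truncated_normal_density(z, centre, tau, lo, hi):
    """Gaussian kernel with precision tau centred at `centre`, renormalised on [lo, hi]."""
    if not lo <= z <= hi:
        return 0.0
    sd = 1.0 / math.sqrt(tau)
    cdf = lambda u: 0.5 * (1.0 + math.erf((u - centre) / (sd * math.sqrt(2.0))))
    pdf = math.exp(-0.5 * ((z - centre) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
    return pdf / (cdf(hi) - cdf(lo))


def sample_levy_density(rng, total_mass_bound=2.0, concentration=1.0,
                        n_atoms=50, lo=-1.0, hi=1.0):
    """Return (alpha, density) for one draw of M from the truncated mixture."""
    # Step 1: stick-breaking weights and atoms approximating P ~ DP(xi).
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, concentration)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)          # leftover mass, so the weights sum to one
    atoms = [rng.uniform(lo, hi) for _ in range(n_atoms)]
    # Step 2: IID precisions tau_j ~ F (here a Gamma law, an assumption).
    taus = [rng.gammavariate(2.0, 1.0) for _ in range(n_atoms)]
    # Step 3: total mass alpha ~ U(0, T).
    alpha = rng.uniform(0.0, total_mass_bound)

    # Step 4: density of M(dz) with respect to Lebesgue measure on J.
    def density(z):
        return alpha * sum(w * truncated_normal_density(z, a, t, lo, hi)
                           for w, a, t in zip(weights, atoms, taus))

    return alpha, density


rng = random.Random(0)
alpha, density = sample_levy_density(rng)
print(alpha, density(0.0))
```

Each kernel integrates to one over $J$, so the sampled density has total mass $\alpha \leq T$, in line with the definition of $\mathcal{D}_T(J)$.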


Sampling all three components, $b(\cdot)$, $c(\cdot, \cdot)$ and $M(\cdot)$, independently from the priors specified
above yields draws which almost surely satisfy (3) by uniform Lipschitz continuity of $b$ and $c$,
as well as a uniform bound on the total mass of $M$:

\begin{align*}
\|b(x) - b(y)\|_2^2 + \int_J \|c(x, z) - c(y, z)\|_2^2 M(dz)
&\leq C_b \|x - y\|_2^2 + \int_J C_c \|x - y\|_2^2 M(dz) \\
&\leq (C_b + \Upsilon C_c) \|x - y\|_2^2.
\end{align*}

Condition ((Tj) is immediate from the uniform Lipschitz continuity of c, and (O holds by con¬ 
struction of the prior for b. The requirement that c(-,0) = 0 holds by construction. Finally, 
$\nu(x, dz) = M(c^*(x, dz))$ is a finite measure for each $x$ because $M$ is finite and $c^*(x, z)$ is a
finite union of points by non-constancy, Lipschitz continuity and compactness of $J$. The Radon-Nikodym
derivative $\frac{d\nu_0}{d\nu}$ exists for the same reason, and is bounded both from above and away
from 0 by compactness of $J$ and (13). Thus, the conditions of Definition 1 are fulfilled.

It remains to verify that ((SJ) holds for this product prior. This will be achieved by controlling 
the three $\pi^{b_0,\nu_0}$-norms separately, and showing that samples which result in all three taking
arbitrarily small values are drawn with positive probability.

First, fix $b_0 \in \Theta_b$, $c_0 \in \Theta_c$ and $M_0 \in \mathcal{D}_\Upsilon(J)$, as well as $\varepsilon > 0$, and define

\[
\|b\|_{m,\infty} := \sup_{\|x\|_2 \leq m} \|b(x)\|_\infty. \tag{14}
\]
Note that $\| \cdot \|_{m,\infty}$ is well defined for Lipschitz functions because they are locally bounded. Then

\begin{align*}
\|b_0 - b\|^2_{2,\pi^{b_0,\nu_0}} &\leq \|b_0 - b\|_{m,\infty} + \int_{\|x\|_2 > m} \|b_0(x) - b(x)\|_2^2 \pi^{b_0,\nu_0}(x) dx \\
&\leq \|b_0 - b\|_{m,\infty} + \int_{\|x\|_2 > m} \|b_0(0) + b(0) + 2 C_1 x\|_2^2 \pi^{b_0,\nu_0}(x) dx,
\end{align*}

by Lipschitz continuity. Now, choose $m$ to be large enough that the second term on the RHS is
bounded above by $\varepsilon/8$, which can be done because $\pi^{b_0,\nu_0}$ has second moments by Proposition 1.
Likewise, the first term can be bounded by $\varepsilon/8$ by choosing $n$ large enough that $\varepsilon_n < \varepsilon/8$. Note
that by construction, the probability of sampling a corresponding function $b$ from the prior is
at least $p_m q_n > 0$, and that for such a $b$ we have

\[
\|b_0 - b\|_{2,\pi^{b_0,\nu_0}} \leq \sqrt{\varepsilon/8 + \varepsilon/8} = \sqrt{\varepsilon}/2.
\]
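
The positivity of the lower bound $p_m q_n$ uses only that both probability mass functions charge every positive integer. The following minimal sketch illustrates this with geometric laws, which are a hypothetical choice: the text does not fix the two mass functions. The function names and the `success` parameters are likewise illustrative. Here $m$ is read as the index controlling the radius of the ball and $n$ as the index controlling the resolution $\varepsilon_n$ of the net.

```python
import numpy as np


def geometric_pmf(k, success):
    """P(K = k) for a geometric law on {1, 2, ...}; positive for every k >= 1."""
    return (1.0 - success) ** (k - 1) * success


def sample_net_indices(rng, p_success=0.5, q_success=0.5):
    """Draw the radius index m ~ p and the resolution index n ~ q independently."""
    m = int(rng.geometric(p_success))
    n = int(rng.geometric(q_success))
    return m, n


def lower_bound(m, n, p_success=0.5, q_success=0.5):
    """The product p_m * q_n that lower-bounds the prior mass in the text."""
    return geometric_pmf(m, p_success) * geometric_pmf(n, q_success)
```

However large the required $m$ and $n$ are, `lower_bound(m, n)` is small but strictly positive under this choice, which is all the argument needs.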


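For the second norm, an elementary calculation using Jensen's inequality and the fact that
$\|z\|_2 \leq 1$ yields that

\begin{align*}
&\Big\| \int 1_{(0,1]}(\|z\|_2) z \Big( \frac{d\nu_0}{d\nu}(\cdot, z) - 1 \Big) \nu(\cdot, dz) \Big\|^2_{2,\pi^{b_0,\nu_0}} \\
&\leq \int_\Omega \nu(x, B_0(1)) \int_{B_0(1)} \Big( \frac{\nu_0(x, z)^2 - 2 \nu_0(x, z) \nu(x, z) + \nu(x, z)^2}{\nu(x, z)^2} \Big) \nu(x, dz) \pi^{b_0,\nu_0}(x) dx \\
&\leq \int_\Omega \pi^{b_0,\nu_0}(x) \nu(x, B_0(1)) \Big( \Big| \int_{z : \|z\|_2 \leq 1} \frac{d\nu_0}{d\nu}(x, z) \nu_0(x, dz) - \nu_0(x, B_0(1)) \Big| + |\nu(x, B_0(1)) - \nu_0(x, B_0(1))| \Big) dx \\
&= \int_{x : \|x\|_2 \leq m} \pi^{b_0,\nu_0}(x) \nu(x, B_0(1)) \Big( \Big| \int_{z : \|z\|_2 \leq 1} \frac{d\nu_0}{d\nu}(x, z) \nu_0(x, dz) - \nu_0(x, B_0(1)) \Big| + |\nu(x, B_0(1)) - \nu_0(x, B_0(1))| \Big) dx \\
&\quad + \int_{x : \|x\|_2 > m} \pi^{b_0,\nu_0}(x) \nu(x, B_0(1)) \Big( \Big| \int_{z : \|z\|_2 \leq 1} \frac{d\nu_0}{d\nu}(x, z) \nu_0(x, dz) - \nu_0(x, B_0(1)) \Big| + |\nu(x, B_0(1)) - \nu_0(x, B_0(1))| \Big) dx. \tag{15}
\end{align*}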
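
One way to spell out the elementary calculation leading to (15) is as follows. This is a sketch under the assumption that the integrand is $z (d\nu_0/d\nu - 1)$ integrated against $\nu$. The shorthand $f$ and $B$ is introduced here for brevity and does not appear in the original. Jensen's inequality is applied to the probability measure $\nu(x, \cdot)/\nu(x, B)$, and the identity $f \, d\nu = d\nu_0$ is used twice.

```latex
% Shorthand (not in the original): f = d\nu_0 / d\nu at a fixed x, and B = B_0(1) \setminus \{0\}.
\begin{align*}
\Big\| \int_B z \, (f(z) - 1) \, \nu(x, dz) \Big\|_2^2
  &\leq \nu(x, B) \int_B \|z\|_2^2 \, (f(z) - 1)^2 \, \nu(x, dz)
   \leq \nu(x, B) \int_B (f(z) - 1)^2 \, \nu(x, dz), \\
\int_B (f(z) - 1)^2 \, \nu(x, dz)
  &= \int_B f(z)^2 \, \nu(x, dz) - 2 \int_B f(z) \, \nu(x, dz) + \nu(x, B) \\
  &= \int_B f(z) \, \nu_0(x, dz) - 2 \nu_0(x, B) + \nu(x, B) \\
  &= \Big( \int_B f(z) \, \nu_0(x, dz) - \nu_0(x, B) \Big) + \big( \nu(x, B) - \nu_0(x, B) \big).
\end{align*}
```

Taking absolute values of the two bracketed terms and integrating against the stationary density gives the two differences that are controlled separately in the remainder of the argument.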


Both $\nu(x, \cdot)$ and $\nu_0(x, \cdot)$ are finite measures with Radon-Nikodym derivative bounded from above
and away from 0. Thus, finiteness of $\pi^{b_0,\nu_0}$ ensures that the second integral on the RHS can be
made arbitrarily small by choosing large enough $m$. Now consider

\begin{align*}
|\nu(x, B_0(1)) - \nu_0(x, B_0(1))| &= |M(c^*(x, B_0(1))) - M_0(c_0^*(x, B_0(1)))| \\
&\leq |M(c^*(x, B_0(1))) - M(c_0^*(x, B_0(1)))| + |M(c_0^*(x, B_0(1))) - M_0(c_0^*(x, B_0(1)))|,
\end{align*}

and note that if $\|c - c_0\|_{m,\infty} < \gamma_1$ then

\[
c^*(x, B_0(1)) \subset c_0^*\Big( x, \Big\{ \xi \in J : \inf_{y \in B_0(1)} \{ \|\xi - y\|_\infty \} < \gamma_1 \Big\} \Big) \tag{16}
\]

for any $x : \|x\|_2 \leq m$. The sets on the RHS are decreasing with decreasing $\gamma_1 > 0$ and of finite
$M$-mass, so that continuity of measure gives $|M(c^*(x, B_0(1))) - M(c_0^*(x, B_0(1)))| < \gamma_2$ for some
$\gamma_2$, which decreases to 0 as $\gamma_1 \searrow 0$. Likewise,

\[
|M(c_0^*(x, B_0(1))) - M_0(c_0^*(x, B_0(1)))| \leq \|M - M_0\|_\infty.
\]

Hence

\[
|\nu(x, B_0(1)) - \nu_0(x, B_0(1))| \leq \gamma_2 + \|M - M_0\|_\infty,
\]

which can be made arbitrarily small by first choosing a sufficiently large $m$, then a sufficiently
small $\gamma_2$ as well as $c : \|c - c_0\|_{m,\infty} < \gamma_2$, and finally an $M : \|M - M_0\|_\infty < \gamma_3$ for sufficiently
small $\gamma_3$.

Similarly, (16) gives that

\[
\frac{d\nu_0}{d\nu}(x, z) = \frac{M_0(c_0^*(x, z))}{M(c^*(x, z))} \leq \frac{M_0(c^*(x, \{ \xi \in J : \|\xi - z\|_\infty < \gamma_1 \}))}{M(c^*(x, z))}
\]

for $x : \|x\|_2 \leq m$ whenever $\|c - c_0\|_{m,\infty} < \gamma_1$. Hence taking such a $c$, as well as $M : \|M - M_0\|_\infty < \gamma_3$,
and using continuity of measure yields the estimate

\[
\frac{d\nu_0}{d\nu}(x, z) \leq \frac{M(c^*(x, z)) + \gamma_3 + \gamma_4}{M(c^*(x, z))}
\]

for some $\gamma_4 \searrow 0$ as $\gamma_1 \searrow 0$. The denominator is bounded from below, so that

\[
\frac{d\nu_0}{d\nu}(x, z) \leq 1 + \frac{\gamma_3 + \gamma_4}{\inf_{\|x\|_2 \leq m, \|z\|_2 \leq 1} \{ \nu(x, z) \}},
\]

which can be made arbitrarily close to 1 by choosing small enough $\gamma_3$ and $\gamma_4$. An analogous
lower bound follows by reversing the roles of $\nu$ and $\nu_0$, and lower bounding the Radon-Nikodym
derivative instead. Thus


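\[
(1 - \gamma_3 - \gamma_4) \nu_0(x, B_0(1)) \leq \int_{z : \|z\|_2 \leq 1} \frac{d\nu_0}{d\nu}(x, z) \nu_0(x, dz) \leq (1 + \gamma_3 + \gamma_4) \nu_0(x, B_0(1)),
\]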
so that

\[
\Big| \int_{z : \|z\|_2 \leq 1} \frac{d\nu_0}{d\nu}(x, z) \nu_0(x, dz) - \nu_0(x, B_0(1)) \Big| \leq \gamma_3 + \gamma_4.
\]


Taken together, the above bounds imply that (15) can be bounded by $\sqrt{\varepsilon}/2$ by first choosing
a large enough $m$, and then $c$ and $M$ such that $\|c - c_0\|_{m,\infty} < \gamma$ and $\|M - M_0\|_\infty < \gamma$ for
sufficiently small $\gamma > 0$. Fix $n$ such that $\varepsilon_n < \gamma$. Then a suitable $c$ is sampled from the prior
with probability at least $p_m q_n > 0$. The probability of sampling a suitable $M$ is also positive,
because by Theorem 1 of Bhattacharya and Dunson [2012], the support of the Dirichlet process
mixture model is dense in $\mathcal{D}_\Upsilon(J)$.

The third norm can be treated identically to the second, because $x \mapsto \log(x)$ is continuous
and $J$ is compact. Hence, its value is also bounded by $\varepsilon/2$ with strictly positive $\Pi$-probability.
Thus, with positive $\Pi$-probability


r> I ll&O ^ 11 2,7T b 0 >^0 + 


*0 - 


dv 0 , x / 


*1 


1 (0,i](ll z ll2)z^(-,dz) 

z/ 0 (-,dz) 


2 ,n b 0’ l '0 


<i(VI + kP) +£ =s 

i,ir b O’ v o 2l 2 + 2 j + 2 ’ 


and hence the condition to be verified holds.
5 Discussion


In this paper we have shown that posterior consistency for identifiable, nonparametric Bayesian 
inference of drift and jump coefficients of jump diffusions from discrete data holds under criteria 
which can be readily checked in practice. Identifiability itself seems difficult to verify beyond the
scope of gradient-type diffusions, for which it was established along with posterior consistency
in [Gugushvili and Spreij, 2014]. Products of discrete net priors and Dirichlet process mixture
models were shown to satisfy the conditions for consistency, provided that identifiability holds.
It seems likely that in a case where identifiability fails but all other consistency criteria hold,
the posterior will converge to be supported on the subset of pairs $(b, \nu)$ that give rise to the
semigroup generating the data, with weights proportional to the prior densities of the pairs, at 
least subject to the set of these pairs being sufficiently regular. However, this conjecture has 
not been verified rigorously. 


Our results share the limitation of [van der Meulen and van Zanten, 2013, Gugushvili and Spreij,
2014] of requiring priors to assign full mass to sets of functions for which the Lipschitz condition
(3) holds uniformly. This rules out many widely used families of priors, but counterexamples
exist to show that without it, uniform equicontinuity fails even for one-dimensional unit
diffusions [Matthias Birkner, personal communication]. It seems clear that an entirely different
approach is needed if consistency results are to be established without a uniform equicontinuity
condition.


A further limitation of [van der Meulen and van Zanten, 2013, Gugushvili and Spreij, 2014] is
that of being established for a weak topology, for which the martingale approach of [Walker,
2004, Lijoi et al., 2004] is well suited. A testing approach, such as that of [Ghosal and van der Vaart,
2007], would yield convergence in a stronger topology as well as rates of convergence, but
it is not clear how to adapt their results to the diffusion or jump diffusion settings. Currently,
results in this direction are only available for continuously observed scalar diffusions
[van der Meulen et al., 2006, Panzar and van Zanten, 2009, Pokern et al., 2013], as well as
discretely observed scalar diffusions on a compact interval [Nickl and Söhl, 2017]. However, these
rely, respectively, on the continuity of the observation and on a tractable representation of the
stationary density, neither of which is available more generally.

Practical implementation of inference algorithms is beyond the scope of this paper, but we
note that algorithms based on exact simulation for jump diffusions are available, at least in the
scalar case [Casella and Roberts, 2011, Gonçalves, 2011, Pollock et al., 2017]. Exact simulation
of jump diffusions is an active area of research [Gonçalves and Roberts, 2013, Pollock, 2015,
Pollock et al., 2016] and well suited for applications in Monte Carlo inference algorithms, with
preliminary results in the continuous diffusion setting indicating that nonparametric algorithms
can be feasibly implemented [Papaspiliopoulos et al., 2012, van Zanten, 2013, van der Meulen et al.,
2014]. As a final remark, we note that presently such algorithms are only available for processes
with jumps driven by compound Poisson processes of finite intensity, and with coefficients
satisfying regularity assumptions comparable to those in Proposition 1. Thus our Theorem 1 brings
the theory on nonparametric posterior consistency in line with current state of the art algorithms
in one dimension, and anticipates development of comparable methods in higher dimensions.